perf: bulk text block scanner bypasses fastparse per-line overhead by He-Pin · Pull Request #689 · databricks/sjsonnet

He-Pin · 2026-04-05T08:13:29Z

Motivation

Text blocks (||| syntax) are parsed line by line through fastparse, which incurs combinator overhead on every line. Programs with large text blocks (templates, embedded configs) pay this cost unnecessarily.

Key Design Decision

Implement a bulk scanner that scans directly for the text block terminator (|||) using a simple character loop. The first line still goes through fastparse so that error messages stay accurate; every subsequent line bypasses the per-line combinator overhead, and the rest of the text block is processed in a single pass.

Modification

- Add bulk text block scanning in the parser
- Directly scan for ||| terminator without per-line fastparse dispatch
- Preserve exact text block semantics (whitespace stripping, indentation)
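The single-pass scan described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the names `TextBlockScanSketch` and `scan` are hypothetical, it assumes `\n` line endings, and it assumes `start` already points at the first content line after the opening `|||`.

```scala
object TextBlockScanSketch {
  private def isWs(c: Char): Boolean = c == ' ' || c == '\t'

  /** Scans a text block body beginning at `start` (hypothetical helper).
    * Returns the unindented text and the index just past the closing `|||`,
    * or None if the block has no indentation or is never terminated.
    */
  def scan(data: String, start: Int): Option[(String, Int)] = {
    // The leading whitespace of the first line defines the block indentation.
    var i = start
    while (i < data.length && isWs(data.charAt(i))) i += 1
    val indent = data.substring(start, i)
    if (indent.isEmpty) return None

    val sb = new java.lang.StringBuilder()
    var lineStart = start
    while (lineStart < data.length) {
      if (data.startsWith(indent, lineStart)) {
        // Content line: strip the indentation, copy the rest including '\n'.
        val contentStart = lineStart + indent.length
        val nl = data.indexOf('\n', contentStart)
        if (nl < 0) return None // input ended before the terminator
        sb.append(data, contentStart, nl + 1) // range copy, no per-line String
        lineStart = nl + 1
      } else if (data.charAt(lineStart) == '\n') {
        // Blank lines are kept even though they carry no indentation.
        sb.append('\n')
        lineStart += 1
      } else {
        // A less-indented line must be the terminator: optional whitespace, then |||.
        var j = lineStart
        while (j < data.length && isWs(data.charAt(j))) j += 1
        return if (data.startsWith("|||", j)) Some((sb.toString, j + 3)) else None
      }
    }
    None
  }
}
```

Lines indented deeper than the first line keep their extra indentation, and a `|||` that is indented at least as far as the block content is treated as ordinary text rather than as the terminator.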

Benchmark Results

JMH (JVM, 3 warmup iterations + 3 measurement iterations)

Benchmark	Master (ms/op)	This PR (ms/op)	Change
bench.02	50.427 ± 38.9	45.838 ± 6.9	-9.1%
comparison2	85.854 ± 188.7	70.746 ± 12.3	-17.6%
realistic2	73.458 ± 66.7	69.255 ± 4.0	-5.7%
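For reference, the Change column is the relative difference between the two means, (PR − master) / master. A minimal sketch that reproduces the figures in the table (the helper name `changePct` is made up for this illustration):

```scala
object PercentChangeSketch {
  /** Relative change of `pr` versus `master`, in percent, rounded to one decimal. */
  def changePct(master: Double, pr: Double): Double =
    math.round((pr - master) / master * 1000) / 10.0
}
```

The calculation uses only the means; it does not take the ± error terms into account.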

Analysis

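The improvement is modest but points the same way on all three benchmarks. The master runs have wide error bars (for example 85.854 ± 188.7 ms/op on comparison2), so the individual deltas should be read with caution. The benefit will be larger for programs with many or large text blocks. Since parsing is typically a small fraction of total eval time, the -5.7% to -17.6% range is expected.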

References

Upstream: explored in the he-pin/sjsonnet jit branch

Result

All 46 tests pass. All three benchmarks improve, with no regressions.

Replace the per-line fastparse combinator loop in tripleBarStringBody with a custom bulk scanner that directly accesses the underlying String data. For a 600KB text block with ~8000 lines, this eliminates ~8000 intermediate String allocations and the Seq[String] + mkString join overhead.

Key changes:
- tripleBarStringBodyBulk: Custom scanner using IndexedParserInput.data for zero-copy StringBuilder.append(CharSequence, start, end) instead of fastparse's repX combinator which creates one String per line.
- Hybrid approach: first line still uses fastparse for proper error messages, subsequent lines use the bulk scanner.
- constructString: Skip string interning for strings >1024 chars (avoids expensive hashCode computation on 600KB strings), single-string fast path, pre-sized StringBuilder for multi-line blocks.
- Falls back to original fastparse path for non-IndexedParserInput.

JMH large_string_template: 2.251 → 1.762 ms/op (-21.7%)
Native large_string_template: ~37% faster

Upstream: explored in he-pin/sjsonnet jit branch
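To illustrate the allocation difference the commit message describes, here is a rough sketch that contrasts joining per-line substrings with appending character ranges into a pre-sized builder. It is not the PR's `constructString`: the object and method names are hypothetical, the line ranges are assumed to be known already, and `String.intern` stands in for whatever interning mechanism the parser actually uses. Only the 1024-character threshold is taken from the commit message.

```scala
object RangeAppendSketch {
  /** Per-line approach: allocates one String per range, then joins them. */
  def joinPerLine(data: String, ranges: Seq[(Int, Int)]): String =
    ranges.map { case (s, e) => data.substring(s, e) }.mkString

  /** Range-append approach: copies each range straight into one builder,
    * sized up front so it never has to grow.
    */
  def joinBulk(data: String, ranges: Seq[(Int, Int)]): String = {
    val total = ranges.iterator.map { case (s, e) => e - s }.sum
    val sb = new java.lang.StringBuilder(total)
    ranges.foreach { case (s, e) => sb.append(data, s, e) } // append(CharSequence, start, end)
    sb.toString
  }

  // Threshold quoted in the commit message; the interning call is an assumption.
  val InternThreshold = 1024

  /** Interns short strings only, so very large strings are returned untouched. */
  def maybeIntern(s: String): String =
    if (s.length > InternThreshold) s else s.intern()
}
```

Both join methods return the same text; the second simply avoids the intermediate per-line `String` objects and the `Seq` that holds them.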

He-Pin mentioned this pull request Apr 5, 2026

perf: escape-free string rendering fast path with bulk copy #678

Open

He-Pin marked this pull request as ready for review April 5, 2026 08:31

He-Pin mentioned this pull request Apr 5, 2026

performance optimization #666

Open

He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/text-block-bulk-scanner
